Research on Extraction Methods of Web Page’s Document Logical Structure


Similar resources

WINGNUS: Keyphrase Extraction Utilizing Document Logical Structure

We present a system description of the WINGNUS team's work for SemEval-2010 Task #5, Automatic Keyphrase Extraction from Scientific Articles. A key feature of our system is that it uses an inferred document logical structure in the candidate identification process to limit the number of phrases in the candidate list while maintaining coverage of important phrases. Our top performing ...


Information Extraction from HTML Documents Based on Logical Document Structure

The World Wide Web is the largest Internet source of information from a broad range of areas. Web documents are mostly written in the Hypertext Markup Language (HTML), which contains no means for semantic description of the content, so the contained information cannot be processed directly. Current approaches for information extraction from HTML are mostly based on wrap...


Analysing the visual complexity of web pages using document structure

The perception of the visual complexity of World Wide Web (Web) pages is a topic of significant interest. Previous work has examined the relationship between complexity and various aspects of presentation, including font styles, colours and images, but automatically quantifying this dimension of a Web page at the level of the document remains a challenge. In this paper we demonstrate that areas...


Toward a Structured Information Retrieval System on the Web: Automatic Structure Extraction of Web Pages

The World Wide Web is a distributed, heterogeneous and semi-structured information space. With the growth of available data, retrieving relevant information is becoming quite difficult, and classical search engines often give very poor results. The Web is changing very quickly, and search engines mainly use old and well-known IR techniques. One of the main problems is the lack of explicit HTM...



Journal

Journal title: Information Technology Journal

Year: 2013

ISSN: 1812-5638

DOI: 10.3923/itj.2014.69.77